Switching Between Views Using Natural Gestures

ABSTRACT

The disclosure includes a system and method for switching between video views and data views. The system includes a controller, a view presentation module, a screen detection module and a view switching module. The controller receives data indicating a participant joined a multi-user communication session. The view presentation module presents a video stream on a mobile device associated with the participant. The screen detection module determines an occurrence of a detection trigger event. The controller receives a video frame image responsive to the occurrence of the detection trigger event. The screen detection module detects a data screen in the video frame image. The view switching module switches a view on the mobile device from video view to data view responsive to a natural gesture performed by the participant. The view presentation module presents a data stream associated with the data screen on the mobile device.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of and claims priority to U.S. application Ser. No. 14/019,915, filed Sep. 6, 2013, entitled “Switching Between Views Using Natural Gestures,” which claims priority under 35 USC §119(e) to U.S. Application No. 61/825,482, filed May 20, 2013, entitled “Method of Switching between Views in Mobile Videoconferencing Using Gestures,” each of which is incorporated by reference in its entirety.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The specification relates to a system and method for switching between video views and data views. In particular, the specification relates to a system and method for switching between video views and data views using natural gestures in mobile videoconferencing.

2. Description of the Background Art

Existing videoconferencing systems collect and transmit data streams along with video and audio streams in a videoconference session. This is because in most business meetings, the users expect to not only see each other, but also to exchange data information, such as documents, presentation slides, handwritten comments, etc. These data streams are usually directly captured from computer screens, separately encoded with special coding tools, and displayed side-by-side with the video streams on a remote site.

The explosion of mobile devices drives more and more videoconferencing service providers to develop mobile applications such as smart phone and tablet applications. These mobile applications make it much easier for the users to access the videoconferencing service from anywhere using mobile devices.

However, it becomes a problem to display both a video view and a data view on the mobile device simultaneously. Due to the limited screen size of the mobile device, it is not possible to display both the video view and the data view at full resolution side-by-side. Currently, the commonly used method is to use a user interface that displays one view at the full screen scale while showing only the thumbnail for the other view. The user interface combines and displays the video view and the data view together. Such a user interface fails to provide a unified experience by separating the user interface into multiple view modes, and may cause confusion when there is more than one data stream.

SUMMARY OF THE INVENTION

The disclosure includes a system and method for switching between video views and data views on a mobile device. In one embodiment, the system includes a controller, a view presentation module, a screen detection module and a view switching module. The controller receives data indicating a participant joins a multi-user communication session. The view presentation module presents a video stream of the multi-user communication session on a mobile device associated with the participant. The screen detection module determines an occurrence of a detection trigger event. The controller receives a video frame image from the video stream responsive to the occurrence of the detection trigger event. The screen detection module detects a first data screen in the video frame image. The controller receives data describing a first natural gesture performed on the mobile device. The view switching module switches a view on the mobile device from video view to data view responsive to the first natural gesture. The view presentation module presents a first data stream associated with the first data screen on the mobile device.

In another embodiment, a computer-implemented method with the following steps is performed. The method receives data indicating that a first participant, a second participant and a third participant joined a multi-user communication session. The method presents a first video stream of the multi-user communication session to a mobile device associated with the third participant. The method receives a video frame image from the first video stream that includes a first device associated with the first participant and a second device associated with the second participant. The method detects a first data screen from the first device and a second data screen from the second device in the video frame image. The method receives data describing a selection of the first data screen performed on the mobile device. The method switches a view on the mobile device from video view to a first data view that corresponds to the first data screen responsive to the selection. The method presents a first data stream associated with the first data screen on the mobile device.

Other aspects include corresponding methods, systems, apparatuses, and computer program products for these and other innovative aspects.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is illustrated by way of example, and not by way of limitation in the figures of the accompanying drawings in which like reference numerals are used to refer to similar elements.

FIG. 1A is a high-level block diagram illustrating one embodiment of a system for switching between video views and data views.

FIG. 1B is a high-level block diagram illustrating another embodiment of a system for switching between video views and data views.

FIG. 2 is a block diagram illustrating one embodiment of a participation application.

FIG. 3A is a graphic representation illustrating one embodiment of a process for performing data screen detection.

FIG. 3B is a graphic representation illustrating one embodiment for switching between video views and data views on a mobile device using natural gestures.

FIG. 4A is a graphic representation of one embodiment of a graphic user interface illustrating a video view mode on a mobile device.

FIG. 4B is a graphic representation of one embodiment of a graphic user interface illustrating a data view mode on a mobile device.

FIG. 4C is a graphic representation of one embodiment of a graphic user interface illustrating an embedded data view mode on a mobile device.

FIG. 5 is a flow diagram illustrating one embodiment of a method for switching between video views and data views using natural gestures in a multi-user communication session.

FIGS. 6A-6C are flow diagrams illustrating another embodiment of a method for switching between video views and data views using natural gestures in a multi-user communication session.

FIG. 7 is a flow diagram illustrating one embodiment of a method for switching between a video view and one of two different data views using a selection in a multi-user communication session.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The system described in the disclosure is particularly advantageous in numerous respects. First, the system allows a participant to use natural gestures such as pinch gestures to switch between video views and data views, and is capable of providing a consistent and seamless user experience in multi-user communication sessions including mobile videoconferencing sessions.

Second, the system is capable of automatically detecting data screens in video frame images. A participant sees a data screen in a video view before switching to a data view showing the data screen in full resolution, which allows the participant to understand the relationship between a data stream of the data screen and the video content in the video view and therefore avoids confusion when more than one data stream is present. For example, if a speaker in a conference room moves frequently between two or more data screens such as a projector screen and a whiteboard screen, the system can eliminate a remote viewer's confusion between the projector screen and the whiteboard screen when the remote viewer frequently switches between the video view and the projector screen data view or between the video view and the whiteboard screen data view.

Third, the system supports embedded data streams and is capable of providing embedded data streams to users. For example, the system can present a data stream of a current meeting in a data view mode, where the data stream of the current meeting is a video clip describing a previous meeting. The video clip is embedded with slides and whiteboard stroke information presented in the previous meeting. The system can switch from the data view mode to the embedded data view mode to present the embedded slides and whiteboard stroke information to participants of the current meeting in full resolution. The system may have numerous other advantages.

A system and method for switching between video views and data views is described below. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the invention. It will be apparent, however, to one skilled in the art that the embodiments can be practiced without these specific details. In other instances, structures and devices are shown in block diagram form in order to avoid obscuring the invention. For example, the invention is described in one embodiment below with reference to mobile devices such as a smart phone and particular software and hardware. However, the description applies to any type of computing device that can receive data and commands, and any peripheral devices providing services.

Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.

Some portions of the detailed descriptions that follow are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers or the like.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

The invention also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, flash memories including USB keys with non-volatile memory, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.

Some embodiments can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. A preferred embodiment is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.

Furthermore, some embodiments can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this invention, a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution. Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.

Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.

Finally, the algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description below. In addition, the specification is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the various embodiments as described herein.

System Overview

FIG. 1A illustrates a block diagram of a system 100 for switching between video views and data views according to one embodiment. The illustrated system 100 includes a hosting device 101 accessible by a host 135, a registration server 130, a camera 103, display devices 107 a . . . 107 n and mobile devices 115 a . . . 115 n accessible by participants 125 a . . . 125 n. In FIG. 1A and the remaining figures, a letter after a reference number, e.g., “115 a,” represents a reference to the element having that particular reference number. A reference number in the text without a following letter, e.g., “115,” represents a general reference to instances of the element bearing that reference number. In the illustrated embodiment, these entities of the system 100 are communicatively coupled via a network 105.

The network 105 can be a conventional type, wired or wireless, and may have numerous different configurations including a star configuration, token ring configuration or other configurations. Furthermore, the network 105 may include a local area network (LAN), a wide area network (WAN) (e.g., the Internet), and/or other interconnected data paths across which multiple devices may communicate. In some embodiments, the network 105 may be a peer-to-peer network. The network 105 may also be coupled to or include portions of a telecommunications network for sending data in a variety of different communication protocols. In some embodiments, the network 105 includes Bluetooth communication networks or a cellular communications network for sending and receiving data including via short messaging service (SMS), multimedia messaging service (MMS), hypertext transfer protocol (HTTP), direct data connection, WAP, email, etc. Although FIG. 1A illustrates one network 105 coupled to the mobile devices 115, the hosting device 101 and the registration server 130, in practice one or more networks 105 can be connected to these entities.

A hosting environment 137 can be an environment to host a multi-user communication session. An example multi-user communication session includes a videoconferencing meeting. In some examples, a hosting environment 137 is a room where all the devices within the dashed box in FIG. 1A are visible to users. For example, the hosting environment 137 could be a conference room environment including one or more display devices 107 and one or more cameras 103 present in the conference room. Example display devices 107 include, but are not limited to, a projector, an electronic whiteboard, a liquid-crystal display and any other conventional display devices. In one embodiment, the camera 103 is an advanced videoconferencing camera. Example cameras 103 include, but are not limited to, a high-definition (HD) video camera that captures high-resolution videos, a pan-tilt-zoom (PTZ) camera that can be mechanically controlled or a group of cameras that provide multi-view or panoramic views in the hosting environment 137. Although two display devices 107 and one camera 103 are illustrated in FIG. 1A, the hosting environment 137 can include one or more display devices 107 and one or more cameras 103.

The hosting device 101, the display devices 107 a . . . 107 n and the camera 103 are located within the hosting environment 137. The hosting device 101 is communicatively coupled to the display device 107 a via signal line 116, the display device 107 n via signal line 118 and the camera 103 via signal line 114. The display device 107 a is optionally coupled to the registration server 130 via signal line 102; the display device 107 n is optionally coupled to the registration server 130 via signal line 104; and the camera 103 is optionally coupled to the registration server 130 via signal line 112.

The hosting device 101 can be a computing device that includes a processor and a memory, and is coupled to the network 105 via signal line 131. For example, the hosting device 101 is a hardware server. In another example, the hosting device 101 is a laptop computer or a desktop computer. The hosting device 101 is accessed by a host 135, for example, a user that manages a meeting. The hosting device 101 includes a hosting application 109 and a storage device for storing the presentations generated by the hosting application 109.

The hosting application 109 includes software for hosting a multi-user communication session. For example, the hosting application 109 hosts a video conferencing meeting that the host 135 manages and one or more participants 125 join using one or more mobile devices 115. In another example, the hosting application 109 generates slides for giving a presentation.

In one embodiment, the hosting application 109 displays data to be shared with other participants 125 on one or more data screens of one or more display devices 107 in the hosting environment 137. Data to be shared with participants 125 includes, but is not limited to, a text-based document, web page content, presentation slides, video clips, stroke-based handwritten comments and/or other user annotations, etc. The one or more data screens in the hosting environment 137 are visible to the camera 103. For example, the presentation slides to be shared with other remote participants 125 are projected on the wall, where the projection of the presentation slides (or, at least a predetermined portion of the projection) is within a field of view of the camera 103. In this case, the camera 103 is capable of capturing the projection of the presentation slides in one or more video frame images of a video stream.

In another example, if a user in a conference room writes comments on an electronic whiteboard, the hosting application 109 can control movement of the camera 103 so that the electronic whiteboard is visible to the camera 103. In this case, the camera 103 is capable of capturing the comments shown on the electronic whiteboard in one or more video frame images of a video stream. The camera 103 captures a video stream including video frame images depicting the hosting environment 137, where the video frame images contain data screens of the display devices 107 and/or the data screen of the hosting device 101.

In one embodiment, the camera 103 sends the video stream to the hosting device 101, causing the hosting device 101 to forward the video stream to one or more of the registration server 130 and the mobile device 115. In another embodiment, the camera 103 sends the video stream directly to the registration server 130 and/or the mobile device 115 via the network 105. In yet another embodiment, the camera 103 sends a latest video frame image captured by the camera 103 to the registration server 130 responsive to an occurrence of a detection trigger event. The detection trigger event is described below in more detail with reference to FIG. 2.

In one embodiment, the hosting application 109 or the display device 107 captures a high quality version of a data stream displayed on a data screen of the display device 107. This high quality version of the data stream displayed on the data screen is referred to as a data stream associated with the data screen, which includes a series of data screen images (e.g., screenshot images) depicting content displayed on the data screen of the display device 107 over time. A screenshot image of a data screen depicts content displayed on the data screen at a particular moment of time. At different moments of time, different screenshot images of the data screen are captured, which form a data stream associated with the data screen. In some examples, a screenshot image of the data screen may also be referred to as a data frame of the data stream.
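
As a purely illustrative sketch of this structure, the following Python code models a data stream as a time-ordered series of screenshot frames; the ScreenFrame and DataStream names are hypothetical and not part of the disclosure.

    import time
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class ScreenFrame:
        """One data frame: a screenshot of a data screen at a moment in time."""
        screen_id: str     # identifies the data screen (e.g., a display device 107)
        timestamp: float   # capture time in seconds since the epoch
        image_png: bytes   # screenshot image of the data screen

    @dataclass
    class DataStream:
        """A data stream associated with one data screen."""
        screen_id: str
        frames: List[ScreenFrame] = field(default_factory=list)

        def append(self, image_png: bytes) -> None:
            self.frames.append(ScreenFrame(self.screen_id, time.time(), image_png))

        def latest(self) -> ScreenFrame:
            return self.frames[-1]  # the most recent screenshot image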

In some examples, the hosting application 109 captures a series of screenshot images describing a slide presentation in high resolution directly from a presentation computing device. In some additional examples, an electronic whiteboard captures original stroke information displayed on the whiteboard screen, and sends screenshot images depicting the original stroke information to the hosting application 109.

In one embodiment, the hosting application 109 sends the data stream associated with the data screen to one or more of the mobile device 115 and the registration server 130. In another embodiment, the display device 107 directly sends the data stream including one or more data screen images to one or more of the mobile device 115 and the registration server 130. For example, the display device 107 periodically sends an up-to-date data screen image to the registration server 130.

In one embodiment, the participation application 123 a can be operable on the registration server 130. The registration server 130 includes a processor and a memory, and is coupled to the network 105 via signal line 106. The registration server 130 includes a database for storing registered images. The registration server 130 registers the display device 107 and receives a video feed for a meeting from the camera 103. The video feed includes one or more video frame images. The registration server 130 runs image matching algorithms to find a correspondence between a latest video frame and a latest screenshot image of a data screen associated with the display device 107 or the hosting device 101. If a match is found, the matching area is highlighted in the video frame image and displayed on the mobile device 115. The registration server 130 is described below in more detail with reference to FIGS. 2 and 3A.

In another embodiment, the participation application 123 b may be stored on a mobile device 115 a, which is connected to the network 105 via signal line 108. The mobile device 115 a, 115 n is a computing device with limited display space that includes a memory and a processor, for example a laptop computer, a tablet computer, a mobile telephone, a smartphone, a personal digital assistant (PDA), a mobile email device or other electronic device capable of accessing a network 105. The mobile device 115 includes a touch screen for displaying data and receiving natural gestures from a participant 125. Examples of natural gestures include, but are not limited to, tap, double tap, long press, scroll, pan, flick, two finger tap, pinch open, pinch close, etc.

In the illustrated embodiment, the participant 125 a interacts with the mobile device 115 a. The mobile device 115 n is communicatively coupled to the network 105 via signal line 110. The participant 125 n interacts with the mobile device 115 n. The participant 125 can be a remote user participating in a multi-user communication session such as a videoconferencing session hosted by the hosting device 101. The mobile devices 115 a, 115 n in FIG. 1A are used by way of example. While FIG. 1A illustrates two mobile devices 115 a and 115 n, the disclosure applies to a system architecture having one or more mobile devices 115.

In one embodiment, the participation application 123 is distributed such that it may be stored in part on the mobile device 115 a, 115 n and in part on the registration server 130. For example, the participation application 123 b on the mobile device 115 acts as a thin-client application that displays the video stream or the data stream while the registration server 130 performs the screen detection steps. The participation application 123 b on the mobile device 115 a instructs the display to present the video stream or the data stream, for example, by rendering images in a browser. The participation application 123 b receives user input (e.g., natural gestures) from the participant 125 and interprets the user input. For example, assume the participation application 123 b currently displays the video stream. The participation application 123 b receives user input from the participant 125 a to magnify the screen so much that it exceeds a threshold, and the participation application 123 b determines that the stream should be switched to the data stream of the hosting device 101. The participation application 123 b sends an instruction to switch from the video stream to the data stream to the participation application 123 a on the registration server 130.
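
A minimal sketch of this thin-client handoff follows, assuming a hypothetical HTTP endpoint and JSON message schema on the registration server; neither is specified by the disclosure.

    import json
    import urllib.request

    ZOOM_THRESHOLD = 0.6  # assumed fraction of the viewport; not a disclosed value

    def maybe_request_switch(server_url: str, session_id: str,
                             screen_id: str, zoom_fraction: float) -> bool:
        """Ask the server to switch streams once a zoom gesture exceeds the threshold."""
        if zoom_fraction < ZOOM_THRESHOLD:
            return False
        body = json.dumps({"session": session_id,
                           "action": "switch_to_data_stream",
                           "screen_id": screen_id}).encode("utf-8")
        req = urllib.request.Request(server_url + "/switch", data=body,
                                     headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)  # server swaps the video stream for the data stream
        return True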

The participation application 123 can be code and routines for participating in a multi-user communication session. In one embodiment, the participation application 123 can be implemented using hardware including a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC). In another embodiment, the participation application 123 can be implemented using a combination of hardware and software. In various embodiments, the participation application 123 may be stored in a combination of the devices and servers, or in one of the devices or servers.

FIG. 1B is another embodiment of a system for switching between video views and data views. In this embodiment, there is no hosting device 101. Instead, mobile devices 115 can comprise the camera 103 and the participation application 123 b. The mobile devices 115 are coupled to the display devices 107 a, 107 n via signal lines 136, 138, respectively.

The participant 125 can activate the camera 103 on the mobile device 115 and point it at the display devices 107 a, 107 n to capture their content. The mobile device 115 can transmit the images directly to the registration server 130 via signal line 154. For example, the images can serve as a query from the mobile device 115. The participation application 123 uses the captured images to detect the screen from the video view and switches to the data view in response to receiving gestures from the participant 125.

Example Participation Application

Referring now to FIG. 2, an example of the participation application 123 is shown in more detail. FIG. 2 is a block diagram of a computing device 200 that includes a participation application 123, a processor 235, a memory 237, an input/output device 241, a communication unit 239 and a storage device 243 according to some examples. The components of the computing device 200 are communicatively coupled by a bus 220. The input/output device 241 is communicatively coupled to the bus 220 via signal line 242. In some embodiments, the computing device 200 can be one of a mobile device 115 and a registration server 130. For example, in one embodiment, the registration server 130 can include a participation application 123 with some of the components described below and the mobile device 115 can include some of the other components described below.

The processor 235 includes an arithmetic logic unit, a microprocessor, a general purpose controller or some other processor array to perform computations and provide electronic display signals to a display device. The processor 235 is coupled to the bus 220 for communication with the other components via signal line 222. Processor 235 processes data signals and may include various computing architectures including a complex instruction set computer (CISC) architecture, a reduced instruction set computer (RISC) architecture, or an architecture implementing a combination of instruction sets. Although FIG. 2 includes a single processor 235, multiple processors 235 may be included. Other processors, operating systems, sensors, displays and physical configurations are possible.

The memory 237 stores instructions and/or data that can be executed by the processor 235. The memory 237 is coupled to the bus 220 for communication with the other components via signal line 224. The instructions and/or data may include code for performing the techniques described herein. The memory 237 may be a dynamic random access memory (DRAM) device, a static random access memory (SRAM) device, flash memory or some other memory device. In some embodiments, the memory 237 also includes a non-volatile memory or similar permanent storage device and media including a hard disk drive, a floppy disk drive, a CD-ROM device, a DVD-ROM device, a DVD-RAM device, a DVD-RW device, a flash memory device, or some other mass storage device for storing information on a more permanent basis.

The communication unit 239 transmits and receives data to and from at least one of the hosting device 101, the mobile device 115 and the registration server 130 depending upon where the participation application 123 is stored. The communication unit 239 is coupled to the bus 220 via signal line 226. In some embodiments, the communication unit 239 includes a port for direct physical connection to the network 105 or to another communication channel. For example, the communication unit 239 includes a USB, SD, CAT-5 or similar port for wired communication with the mobile device 115. In some embodiments, the communication unit 239 includes a wireless transceiver for exchanging data with the mobile device 115 or other communication channels using one or more wireless communication methods, including IEEE 802.11, IEEE 802.16, BLUETOOTH® or another suitable wireless communication method.

In some embodiments, the communication unit 239 includes a cellular communications transceiver for sending and receiving data over a cellular communications network including via short messaging service (SMS), multimedia messaging service (MMS), hypertext transfer protocol (HTTP), direct data connection, WAP, e-mail or another suitable type of electronic communication. In some embodiments, the communication unit 239 includes a wired port and a wireless transceiver. The communication unit 239 also provides other conventional connections to the network 105 for distribution of files and/or media objects using standard network protocols including TCP/IP, HTTP, HTTPS and SMTP, etc.

The storage device 243 can be a non-transitory memory that stores data for providing the functionality described herein. The storage device 243 may be a dynamic random access memory (DRAM) device, a static random access memory (SRAM) device, flash memory or some other memory devices. In some embodiments, the storage device 243 also includes a non-volatile memory or similar permanent storage device and media including a hard disk drive, a floppy disk drive, a CD-ROM device, a DVD-ROM device, a DVD-RAM device, a DVD-RW device, a flash memory device, or some other mass storage device for storing information on a more permanent basis.

In the illustrated embodiment, the storage device 243 is communicatively coupled to the bus 220 via signal line 228. In one embodiment, the storage device 243 stores one or more of a video stream including one or more video frame images, a data stream including one or more data screen images and one or more detection trigger events, etc. The storage device 243 may store other data for providing the functionality described herein. For example, the storage device 243 could store copies of videoconferencing materials, such as presentations, documents, audio clips, video clips, etc.

In the illustrated embodiment shown in FIG. 2, the participation application 123 includes a controller 202, a view presentation module 204, a screen detection module 206, a view switching module 208, a user interface module 210 and an optional camera adjustment module 212. The components of the participation application 123 are communicatively coupled via the bus 220. Persons of ordinary skill in the art will recognize that the components can be stored in part on the mobile device 115 and in part on the registration server 130. For example, the participation application 123 stored on the registration server 130 could include the screen detection module 206 and the participation application 123 stored on the mobile device could include the remaining components.

The controller 202 can be software including routines for handling communications between the participation application 123 and other components of the computing device 200. In one embodiment, the controller 202 can be a set of instructions executable by the processor 235 to provide the functionality described below for handling communications between the participation application 123 and other components of the computing device 200. In another embodiment, the controller 202 can be stored in the memory 237 of the computing device 200 and can be accessible and executable by the processor 235. In either embodiment, the controller 202 can be adapted for cooperation and communication with the processor 235 and other components of the computing device 200 via signal line 230.

In one embodiment, the controller 202 sends and receives data, via the communication unit 239, to and from one or more of the mobile device 115, the hosting device 101 and the registration server 130. For example, the controller 202 receives, via the communication unit 239, user input from a participant 125 operating on a mobile device 115 and sends the user input to the view switching module 208. In another example, the controller 202 receives graphical data for providing a user interface to a participant 125 from the user interface module 210 and sends the graphical data to a mobile device 115, causing the mobile device 115 to present the user interface to the participant 125.

In one embodiment, the controller 202 receives data from other components of the participation application 123 and stores the data in the storage device 243. For example, the controller 202 receives data describing one or more detection trigger events from the screen detection module 206 and stores the data in the storage device 243. In another embodiment, the controller 202 retrieves data from the storage device 243 and sends the data to other components of the participation application 123. For example, the controller 202 retrieves a data stream from the storage device 243 and sends the data stream to the view presentation module 204 for presenting the data stream to a participant 125.

The view presentation module 204 can be software including routines for presenting a video view or a data view on a mobile device 115. In one embodiment, the view presentation module 204 can be a set of instructions executable by the processor 235 to provide the functionality described below for presenting a data view or a video view on a mobile device 115. In another embodiment, the view presentation module 204 can be stored in the memory 237 of the computing device 200 and can be accessible and executable by the processor 235. In either embodiment, the view presentation module 204 can be adapted for cooperation and communication with the processor 235 and other components of the computing device 200 via signal line 232.

A video view mode presents video data associated with a multi-user communication session to a participant 125. For example, the video view mode presents the video stream of the other participants in the multi-user communication session to the participant 125 in full screen on the mobile device 115. In another example, the video view mode presents the video stream on the mobile device 115 in full resolution.

In one embodiment, the view presentation module 204 receives data indicating that a participant 125 joins a multi-user communication session from a mobile device 115 associated with the participant 125. Correspondingly, the mobile device 115 is in the video view mode. The view presentation module 204 receives a video stream including one or more video frame images from the camera 103 directly or via the hosting device 101, and presents the video stream to the participant 125 on a display of the mobile device 115.

In some examples, one or more data screens that are in the same hosting environment 137 as the camera 103 are captured in the one or more video frame images of the video stream, and the one or more video frame images include sub-images depicting the one or more data screens. For example, the one or more video frame images capture at least a portion of a data screen of a hosting device 101, a portion of a screen projection on the wall and/or a portion of a data screen of an electronic whiteboard. In another example, the one or more video frame images capture the full data screen of the hosting device 101, the full projection screen on the wall and/or the full data screen of the electronic whiteboard.

A data view mode presents a data stream associated with the multi-user communication session to the participant 125. For example, the data view mode presents a data stream with the slides being presented during the multi-user communication session to the participant 125 in full screen on the mobile device 115. In another example, the data view mode presents the data stream on the mobile device 115 in full resolution.

In one embodiment, the view presentation module 204 receives, from the view switching module 208, an identifier (ID) of a detected data screen and a view switching signal indicating that a view on the mobile device 115 should be switched from the video view to the data view. In some embodiments, the view presentation module 204 receives a data stream associated with the detected data screen directly from the display device 107 associated with the data screen. In some other embodiments, the view presentation module 204 receives the data stream via the hosting device 101. Responsive to receiving the view switching signal, the view presentation module 204 stops presenting the video stream on the mobile device 115 and starts to present the data stream associated with the data screen on the mobile device 115. Persons of ordinary skill in the art will recognize that the sections describing presenting the video stream or data stream are meant to represent the view presentation module 204 instructing the user interface module 210 to generate graphical data that is sent to the mobile device 115 via the communication unit 239 for display.

In some examples, an embedded data stream is included in the data stream. The view presentation module 204 receives, from the view switching module 208, a view switching signal instructing the view presentation module 204 to switch the view on the mobile device 115 from the data view to an embedded data view. The embedded data view mode presents the embedded data stream to the participant 125 in full resolution or in full screen on the mobile device 115. Responsive to receiving the view switching signal, the view presentation module 204 stops presenting the data stream and starts to present the embedded data stream on the mobile device 115. An embedded data stream can be a videoconferencing meeting, a presentation, a video clip, a text document, presentation slides, or other types of data embedded in the data stream.

If the view presentation module 204 receives, from the view switching module 208, a view switching signal instructing the view presentation module 204 to switch the view from the embedded data view back to the data view, the view presentation module 204 stops presenting the embedded data stream and starts to present the data stream on the mobile device 115 again. In one embodiment, the view presentation module 204 receives, from the view switching module 208, a view switching signal instructing the view presentation module 204 to switch the view on the mobile device 115 from the data view to the video view. Responsive to the view switching signal, the view presentation module 204 stops presenting the data stream and starts to present the video stream on the mobile device 115.

The screen detection module 206 can be software including routines for performing data screen detection in a video frame image. In one embodiment, the screen detection module 206 can be a set of instructions executable by the processor 235 to provide the functionality described below for performing data screen detection in a video frame image. In another embodiment, the screen detection module 206 can be stored in the memory 237 of the computing device 200 and can be accessible and executable by the processor 235. In either embodiment, the screen detection module 206 can be adapted for cooperation and communication with the processor 235 and other components of the computing device 200 via signal line 234.

In one embodiment, the screen detection module 206 registers one or more display devices 107 with the registration server 130. For example, the screen detection module 206 can record a device identifier, a user associated with the display device 107, etc. for each display device 107 and store the registration information in the storage device 243. Each display device 107 sends an updated image of its data screen to the registration server 130 periodically. For example, each display device 107 sends its up-to-date screenshot image to the registration server 130 periodically. In some examples, the display device 107 sends the updated screenshot images of its data screen to the registration server 130 via the hosting device 101.
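
The periodic upload might look like the following sketch, where the endpoint path and payload format are assumptions for illustration and the screenshot capture is left to a caller-supplied function.

    import time
    import urllib.request

    def push_screenshots(server_url: str, screen_id: str,
                         grab_screenshot, interval_s: float = 2.0) -> None:
        """Periodically PUT an up-to-date screenshot to the registration server.

        grab_screenshot is a caller-supplied function returning PNG bytes.
        """
        while True:
            png = grab_screenshot()
            req = urllib.request.Request(
                f"{server_url}/screens/{screen_id}/latest",
                data=png, headers={"Content-Type": "image/png"}, method="PUT")
            urllib.request.urlopen(req)  # server stores the updated screen image
            time.sleep(interval_s)       # send an updated screenshot periodically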

In one embodiment, the screen detection module 206 detects an occurrence of a trigger event. For example, the event could be a detection trigger event that triggers a detection of one or more data screens in a video frame image; such an event causes the screen detection module 206 to detect whether the video frame image includes a data screen. Example detection trigger events include, but are not limited to, motion of the camera 103 (e.g., panning, zooming or tilting of the camera 103, movement of the camera 103, etc.) and/or motion of an object in the video frame image (e.g., appearance of a projection on the wall in the video frame image, movement of a whiteboard, etc.). In another example, the trigger event could be based on a timer.
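
One plausible way to realize a motion-based detection trigger event is sketched below with OpenCV frame differencing; the thresholds are assumptions, not disclosed values.

    import cv2
    import numpy as np

    MOTION_FRACTION = 0.05  # assumed: trigger when more than 5% of pixels change

    def is_detection_trigger(prev_frame: np.ndarray, cur_frame: np.ndarray) -> bool:
        """Flag large inter-frame changes, covering camera motion and object motion."""
        prev = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
        cur = cv2.cvtColor(cur_frame, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(prev, cur)  # per-pixel intensity change
        changed = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)[1]
        return np.count_nonzero(changed) / changed.size > MOTION_FRACTION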

Responsive to the occurrence of the detection trigger event, the screen detection module 206 receives a latest video frame image of the video stream from the camera 103 directly or via the hosting device 101. In some examples, the screen detection module 206 receives the latest video frame image of the video stream from the mobile device 115 or a video server that provides the video stream. The screen detection module 206 performs data screen detection in the latest video frame image responsive to the occurrence of the detection trigger event. For example, the screen detection module 206 determines whether a data screen appears in the latest video frame image by matching a latest screenshot image of the data screen with the latest video frame image.

In some examples, for each data screen registered with the registration server 130, the screen detection module 206 determines whether a sub-image that matches the latest screenshot image of the data screen appears in the latest video frame image. For example, the screen detection module 206 determines whether the latest video frame image includes a sub-image that depicts the data screen (e.g., the screen detection module 206 determines whether the data screen is captured by the latest video frame image). In a further example, the screen detection module 206 runs an image matching algorithm to find the correspondence between the latest video frame image and the latest screenshot image of the data screen. If the screen detection module 206 finds a match between the latest video frame image and the latest screenshot image of the data screen, the screen detection module 206 highlights the matching area in the video frame image on the mobile device 115. For example, the screen detection module 206 highlights the detected data screen in the video frame image on the mobile device 115.

In one embodiment, the screen detection module 206 runs the image matching algorithm in real time. An example image matching algorithm includes a scale-invariant feature transform (SIFT) algorithm. The SIFT algorithm extracts feature points of both the latest video frame image and the latest screenshot image of the data screen; the feature points from both images are matched based on the k-nearest neighbors (KNN), and the random sample consensus (RANSAC) algorithm is used to find the consensus and to determine the homography matrix. Additional information about how to use SIFT, KNN and RANSAC for image matching can be found at Hess, R., An Open-Source SIFT Library, Proceedings of the International Conference on Multimedia, October 2010, pp. 1493-96. Persons of ordinary skill in the art will recognize that other image matching algorithms can be used.
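
The matching step can be sketched with OpenCV as follows; the SIFT/KNN/RANSAC pipeline follows the description above, while the ratio-test and inlier thresholds are assumptions.

    import cv2
    import numpy as np

    def find_screen(video_frame, screenshot, min_inliers=15):
        """Locate the data screen's quadrilateral in the video frame, or return None.

        Both inputs are BGR images; the returned 4x1x2 array is the matching
        area to highlight on the mobile device.
        """
        sift = cv2.SIFT_create()
        kp1, des1 = sift.detectAndCompute(
            cv2.cvtColor(screenshot, cv2.COLOR_BGR2GRAY), None)
        kp2, des2 = sift.detectAndCompute(
            cv2.cvtColor(video_frame, cv2.COLOR_BGR2GRAY), None)
        if des1 is None or des2 is None:
            return None
        good = []
        for pair in cv2.BFMatcher().knnMatch(des1, des2, k=2):
            if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
                good.append(pair[0])  # Lowe's ratio test on k-nearest neighbors
        if len(good) < min_inliers:
            return None
        src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
        H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)  # RANSAC consensus
        if H is None or int(mask.sum()) < min_inliers:
            return None
        h, w = screenshot.shape[:2]
        corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
        return cv2.perspectiveTransform(corners, H)  # screen corners in the frame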

If the screen detection module 206 detects one or more data screens existing in the video frame image, the screen detection module 206 generates a matching result including one or more matches between the one or more data screens and the video frame image. The screen detection module 206 notifies the mobile device 115 of the one or more matches, and establishes a direct connection between the mobile device 115 and each display device 107 that has one matched data screen. The screen detection module 206 highlights one or more matching areas in the video frame image, where each matching area corresponds to a position of one data screen captured in the video frame image. The screen detection module 206 displays the highlighted matching areas on the mobile device 115.

In another embodiment, the camera 103 is statically deployed and captures one or more data screens in the hosting environment 137, and positions of the one or more data screens remain unchanged in the video frame images. The screen detection module 206 can determine existence of the one or more data screens based on the static setting in the hosting environment 137, and can pre-calibrate positions of the one or more data screens in the video frame images. The screen detection module 206 highlights the one or more data screens in the video frame images at the pre-calibrated positions in the video frame images.

The screen detection module 206 sends one or more screen IDs identifying the one or more detected data screens and data describing one or more matching areas in the video frame image to the view switching module 208. In another embodiment, the screen detection module 206 sends pre-calibrated positions of one or more data screens to the view switching module 208. In yet another embodiment, the screen detection module 206 stores the one or more screen IDs, data describing the one or more matching areas and/or the pre-calibrated positions in the storage device 243.

The view switching module 208 can be software including routines for switching a view on a mobile device 115 between a video view and a data view. In one embodiment, the view switching module 208 can be a set of instructions executable by the processor 235 to provide the functionality described below for switching a view on a mobile device 115 between a video view and a data view. In another embodiment, the view switching module 208 can be stored in the memory 237 of the computing device 200 and can be accessible and executable by the processor 235. In either embodiment, the view switching module 208 can be adapted for cooperation and communication with the processor 235 and other components of the computing device 200 via signal line 236.

In one embodiment, the view switching module 208 receives, from the screen detection module 206, data describing one or more screen IDs identifying one or more detected data screens and one or more matching areas associated with the one or more detected data screens in the video frame image. In the video view mode, the mobile device 115 presents the video stream to the participant 125, with the one or more detected data screens highlighted in the matching areas of the video frame images. If the participant 125 performs a natural gesture (e.g., a pinch open or double tap gesture, etc.) within a highlighted matching area of a data screen on a touch screen of the mobile device 115, the view switching module 208 interprets the participant's natural gesture as a command to switch from the video view to the data view. The view switching module 208 generates a view switching signal describing the command and sends the view switching signal to the view presentation module 204, causing the view presentation module 204 to present the data view to the participant 125. In one embodiment, the view switching module 208 interprets the natural gesture as a command to switch from the video view to the data view if the portion of the data screen detected in the video frame image is greater than a predetermined threshold (e.g., a majority portion of the data screen appearing in the video frame image).
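
For illustration, hit-testing a gesture against the highlighted matching areas could look like the following sketch, reusing the quadrilaterals returned by a detector such as the find_screen() sketch above.

    import cv2
    import numpy as np

    def screen_under_gesture(touch_xy, matching_areas):
        """Return the ID of the data screen whose matching area contains the touch.

        matching_areas maps screen ID -> 4x1x2 quadrilateral in frame coordinates.
        """
        point = (float(touch_xy[0]), float(touch_xy[1]))
        for screen_id, quad in matching_areas.items():
            # pointPolygonTest returns >= 0 when the point is inside or on the edge
            if cv2.pointPolygonTest(quad.astype(np.float32), point, False) >= 0:
                return screen_id  # gesture selects this screen's data view
        return None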

For example, the participant 125 can use a natural gesture to zoom into a data screen detected in the video frame image, so that the video view presenting the video frame image scales up accordingly on the touch screen of the mobile device 115. If the size of the scaled-up data screen in the video frame image reaches a predetermined threshold, the view switching module 208 automatically switches the view on the mobile device 115 from the video view to the data view, causing the view presentation module 204 to present the data stream associated with the detected data screen on the mobile device 115. The mobile device 115 switches from the video view mode to the data view mode accordingly. The participant 125 can further perform natural gestures to operate on the data stream such as zooming into the data stream, copying the data stream, dragging the data stream, etc.
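
A hedged sketch of that size test follows; the one-half threshold is an assumed value, not one given in the disclosure.

    def should_switch_to_data_view(screen_area_px: float, viewport_area_px: float,
                                   zoom_scale: float, threshold: float = 0.5) -> bool:
        """Decide whether a zoomed data screen now fills enough of the viewport.

        screen_area_px is the detected screen's area in the unscaled video frame;
        zoom_scale is the current pinch-zoom factor, which scales area quadratically.
        """
        scaled_fraction = (screen_area_px * zoom_scale ** 2) / viewport_area_px
        return scaled_fraction >= threshold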

In the data view mode, the mobile device 115 presents the data stream to the participant 125. If the participant 125 performs a natural gesture (e.g., a pinch close gesture or tapping on an exit icon, etc.) on the data stream displayed on a touch screen of the mobile device 115, the view switching module 208 interprets the participant's natural gesture as a command to switch from the data view back to the video view. The view switching module 208 generates a view switching signal describing the command and sends the view switching signal to the view presentation module 204, causing the view presentation module 204 to present the video view to the participant 125. Again, the screen detection module 206 detects the one or more data screens visible to the camera 103 in the video frame images, and highlights the one or more data screens in the video frame images. For example, the participant 125 can use a natural gesture to zoom out of the data stream, so that the data view presenting the data stream scales down accordingly on the touch screen of the mobile device 115. If the size of the scaled-down data stream reaches a predetermined threshold, the view switching module 208 automatically switches the view on the mobile device 115 from the data view to the video view, causing the view presentation module 204 to present the video stream on the mobile device 115.

In the data view mode, if the presented data stream includes an embedded data stream, the participant 125 can use a natural gesture on the embedded data stream. The view switching module 208 interprets the natural gesture as a command to switch from the data view to the embedded data view. The view switching module 208 generates a view switching signal describing the command and sends the view switching signal to the view presentation module 204, causing the view presentation module 204 to present the embedded data stream to the participant 125 in full resolution. The participant 125 may perform another natural gesture to exit from the embedded data view and return to the data view. For example, if the data stream includes an embedded video, the participant 125 can issue a tap open command on an icon representing the embedded video during the data view mode, causing the view presentation module 204 to present the embedded video in full screen on the mobile device 115. After viewing the embedded video, the participant 125 can issue a pinch close command to exit from the embedded data view and return to the data view.
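
Taken together, the view transitions described above can be summarized as a small state machine; the gesture names follow the text, while the transition table itself is an illustrative assumption.

    # View modes: "video", "data", "embedded".
    TRANSITIONS = {
        ("video", "pinch_open"): "data",      # zoom into a detected data screen
        ("data", "pinch_close"): "video",     # zoom out back to the video view
        ("data", "tap_open"): "embedded",     # open an embedded data stream
        ("embedded", "pinch_close"): "data",  # exit the embedded data view
    }

    def next_view(current: str, gesture: str) -> str:
        """Return the next view mode; unrecognized gestures leave the view unchanged."""
        return TRANSITIONS.get((current, gesture), current)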

The user interface module 210 can be software including routines for generating graphical data for providing a user interface. In one embodiment, the user interface module 210 can be a set of instructions executable by the processor 235 to provide the functionality described below for generating graphical data for providing a user interface. In another embodiment, the user interface module 210 can be stored in the memory 237 of the computing device 200 and can be accessible and executable by the processor 235. In either embodiment, the user interface module 210 can be adapted for cooperation and communication with the processor 235 and other components of the computing device 200 via signal line 238.

In one embodiment, the user interface module 210 receives instructions from the view presentation module 204 to generate graphical data for providing a user interface to a user such as a host 135 or a participant 125. The user interface module 210 sends the graphical data to the hosting device 101 or the mobile device 115, causing the hosting device 101 or the mobile device 115 to present the user interface to the user. For example, the user interface module 210 generates graphical data for providing a user interface that depicts a video stream or a data stream. The user interface module 210 sends the graphical data to the mobile device 115, causing the mobile device 115 to present the video stream or the data stream to the participant 125 via the user interface. In other embodiments, the user interface module 210 may generate graphical data for providing other user interfaces to users.

The optional camera adjustment module 212 can be software including routines for adjusting a camera 103. In one embodiment, the camera adjustment module 212 can be a set of instructions executable by the processor 235 to provide the functionality described below for adjusting a camera 103. In another embodiment, the camera adjustment module 212 can be stored in the memory 237 of the computing device 200 and can be accessible and executable by the processor 235. In either embodiment, the camera adjustment module 212 can be adapted for cooperation and communication with the processor 235 and other components of the computing device 200 via signal line 240.

In one embodiment, a participant 125 can use natural gestures to navigate the camera 103. For example, the participant 125 can perform a natural gesture to change the view angle of the camera 103 via a user interface shown on the mobile device 115. The camera adjustment module 212 receives data describing the participant's natural gesture and interprets the participant's natural gesture as a command to adjust the camera 103, such as panning, tilting, zooming in or zooming out the camera 103. The camera adjustment module 212 adjusts the camera 103 according to the participant's natural gesture. Through adjusting the camera 103, the participant 125 may keep one or more data screens of one or more display devices 107 within the field of view of the camera 103, so that the camera 103 captures the one or more data screens in the video frame images.
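
A minimal sketch of such a gesture-to-camera mapping appears below, assuming drag gestures map to pan/tilt and pinch gestures map to zoom. The gesture dictionary layout and the calibration constants are hypothetical, not part of the disclosure.

    # Hypothetical gesture-to-PTZ mapping; calibration constants are assumed values.
    PAN_DEGREES_PER_PIXEL = 0.1
    TILT_DEGREES_PER_PIXEL = 0.1

    def gesture_to_camera_command(gesture):
        """Translate a touch gesture into a pan/tilt/zoom adjustment."""
        if gesture["kind"] == "drag":
            # Horizontal drags pan the camera; vertical drags tilt it.
            return {"pan": -gesture["dx"] * PAN_DEGREES_PER_PIXEL,
                    "tilt": gesture["dy"] * TILT_DEGREES_PER_PIXEL}
        if gesture["kind"] == "pinch":
            # Pinch open (scale > 1) zooms in; pinch close (scale < 1) zooms out.
            return {"zoom": gesture["scale"]}
        return {}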

An example use of the system described herein includes a videoconferencing scenario, where a first party (e.g., a host 135) is in a conference room equipped with a camera 103 and one or more data screens, and a second party (e.g., a participant 125) is a remote mobile user participating in the videoconference using a mobile device 115 such as a smart phone or a tablet. After the participant 125 joins the videoconference, the participation application 123 receives a video stream from the camera 103 and presents the video stream to the participant 125 on a touch screen of the mobile device 115. The participation application 123 detects one or more data screens captured in the video frame images. The participant 125 can issue a natural gesture such as a pinch open gesture on a detected data screen highlighted in the video frame images, causing the mobile device 115 to switch from video view to data view. Afterwards, the participation application 123 presents a data stream associated with the detected data screen to the participant 125 in full resolution. The participant 125 may issue another natural gesture such as a pinch close gesture to switch from the data view back to the video view.

Another example use of the system described herein includes a retrieval application for retrieving information relevant to an image. For example, a user can capture an image of an advertisement (e.g., an advertisement for a vehicle brand), and instruct the retrieval application to retrieve information relevant to the advertisement. The image of the advertisement may include a banner and/or a data screen image showing a commercial video. The retrieval application can instruct the screen detection module 206 to detect the data screen in the image of the advertisement and to identify a product that matches content shown in the data screen image. The retrieval application may retrieve information relevant to the identified product from one or more databases and provide the relevant information to a user. Other example uses of the system described herein are possible.

Graphic Representations

FIG. 3A is a graphic representation 300 illustrating one embodiment of a process for performing data screen detection. After a participant 125 joins a multi-user communication session using the mobile device 115, the camera 103 establishes a video stream connection 302 with the mobile device 115. The camera 103 sends a video stream to the mobile device 115 via the video stream connection 302, causing the mobile device 115 to present the video stream to the participant 125 in a video view mode. The display device 107 registers with the registration server 130 and sends updated screenshot images 304 of a data screen associated with the display device 107 to the registration server 130 periodically. In the illustrated example, the display device 107 is an electronic whiteboard. In one embodiment, the registration server 130 detects a detection trigger event. For example, the registration server 130 detects motion of the camera 103 such as panning or tilting. The registration server 130 receives a latest video frame image 306 from the camera 103 responsive to the detection trigger event. In some examples, the registration server 130 receives the latest video frame image 306 from the mobile device 115.
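
A possible shape for this trigger-and-fetch behavior is sketched below in Python. The camera and registration server interfaces (has_moved, latest_frame, detect_screens) and the polling period are assumed for illustration and are not specified by the disclosure.

    import time

    def run_detection_loop(camera, registration_server, poll_period_s=1.0):
        """Watch for a detection trigger event (camera motion) and fetch a frame."""
        while True:
            if camera.has_moved():  # e.g., panning or tilting detected
                frame = camera.latest_frame()  # the latest video frame image 306
                registration_server.detect_screens(frame)
            time.sleep(poll_period_s)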

The registration server 130 uses an image-matching method to detect active data screens dynamically. For example, the registration server 130 uses an image matching algorithm to find the correspondence between the latest video frame image 306 and the latest screenshot image received from either the hosting device 101 or the display device 107. If a matching result 308 between the latest video frame image 306 and the latest screenshot image of the data screen is found, the registration server 130 notifies the mobile device 115 of the matching result 308 and highlights the corresponding data screen in the video frame images. For example, the registration server 130 uses a box 310 to highlight a data screen of an electronic whiteboard in the video frame image. The display device 107 associated with the data screen establishes a data stream connection 312 with the mobile device 115. The display device 107 may send a data stream to the mobile device 115 via the data stream connection 312.
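
The disclosure does not name a particular image-matching algorithm; one plausible realization uses local features and a RANSAC homography, as in the following OpenCV sketch, which returns the corners of the matched data screen (the region a box such as box 310 would highlight). All parameter values are assumptions.

    import cv2
    import numpy as np

    def locate_data_screen(frame, screenshot, min_matches=15):
        """Return the four corners of `screenshot` found inside `frame`, or None."""
        to_gray = lambda im: cv2.cvtColor(im, cv2.COLOR_BGR2GRAY) if im.ndim == 3 else im
        orb = cv2.ORB_create()
        kp1, des1 = orb.detectAndCompute(to_gray(screenshot), None)
        kp2, des2 = orb.detectAndCompute(to_gray(frame), None)
        if des1 is None or des2 is None:
            return None
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
        if len(matches) < min_matches:
            return None
        src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        homography, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        if homography is None:
            return None
        h, w = screenshot.shape[:2]
        corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
        # Project the screenshot's corners into the video frame: the highlight region.
        return cv2.perspectiveTransform(corners, homography)

The homography step is what allows the data screen to be located and highlighted even when the camera views it at an angle; a template-matching or learned-feature approach could be substituted.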

FIG. 3B is a graphic representation 319 illustrating one embodiment for switching between video views and data views on a mobile device 115 using natural gestures. The participation application 123 interprets natural gestures from a participant 125 to achieve a seamless user experience. In this embodiment, the participation application 123 captures and transmits a first data stream and a second data stream, along with the video stream captured from the camera 103, to the mobile device 115. The first data stream includes the high quality screenshot images from the hosting device 101 (e.g., a laptop), and the second data stream includes the strokes from the display device 107 (e.g., an electronic whiteboard). Both data screens (the data screen of the laptop and the data screen of the electronic whiteboard) are visible to the camera 103. In one embodiment, the screenshot images from the hosting device 101 include an embedded image depicting the data screen of the display device 107.

At the beginning, the video view is shown on the mobile device 115 to present the video frame image 320 to the participant 125. For example, the video frame image 320 is shown in full resolution or on the full screen of the mobile device 115. As shown in FIG. 3B, both data screens (the data screen 324 of the laptop and the data screen 322 of the electronic whiteboard) are visible in the video frame image 320 displayed on the participant's mobile device 115. The participation application 123 intelligently detects and notifies the participant 125 of the existence of data screens in the video frame images. For example, the participation application 123 highlights the data screens 322 and 324 in the video frame image 320.

At phase (1), if the participant 125 tries to get more detail from the laptop data screen 324, he or she can perform a natural gesture 330 on the laptop data screen 324 shown in the video frame image 320 to zoom into the laptop data screen 324. An example natural gesture 330 can be a pinch or double tap gesture. Responsive to the natural gesture 330, the video view on the mobile device 115 scales up. If the size of the recognized laptop data screen 324 reaches a pre-set threshold, the mobile device 115 automatically switches from the video view to the data view. For example, the view on the mobile device 115 switches from presenting the video frame image 320 in full resolution to presenting a high quality screenshot image 326 of the laptop data screen 324 in full resolution. The participation application 123 interprets any further pinch or dragging gestures as operating on the screenshot image 326 of the laptop data screen 324.
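
A sketch of the hit test and threshold check implied at phase (1) follows. The bounding-box representation of detected screens, the gesture fields, and the 0.5 ratio are assumptions for illustration, not the disclosed implementation.

    SCALE_UP_THRESHOLD = 0.5  # assumed: switch once the screen spans half the display

    def maybe_switch_to_data_view(screens, gesture, display_width):
        """If a zoom gesture lands on a detected screen and the zoomed screen
        grows past the threshold, return that screen's data stream id."""
        for screen in screens:
            x, y, w, h = screen["bbox"]  # highlighted region in the video frame
            if x <= gesture["x"] <= x + w and y <= gesture["y"] <= y + h:
                zoomed_width = w * gesture.get("scale", 1.0)
                if zoomed_width / display_width >= SCALE_UP_THRESHOLD:
                    return screen["stream_id"]  # video view switches to data view
        return None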

At phase (2), when the participant 125 performs a natural gesture 332 such as a pinch gesture on the screenshot image 326 to zoom out of the data view and the zoom-out scale ratio reaches a pre-set threshold, the mobile device 115 switches back to the video view from the data view. Again, the participation application 123 presents the video frame image 320 on the mobile device 115 in full resolution, and detects and marks the visible data screens 322 and 324 in the video frame image 320.

At phase (3), the participant 125 performs a natural gesture 334 such as a dragging gesture on the highlighted data screen 322 in the video frame image 320, causing the mobile device 115 to enlarge the video view beyond a threshold amount, which causes the mobile device 115 to switch from showing the video frame image 320 in full resolution to the data view showing a screenshot image 328 of the electronic whiteboard in full resolution. At phase (4), the participant 125 performs a natural gesture 336 such as a pinch gesture on the screenshot image 328 to zoom out of the data view, causing the mobile device 115 to shrink the data view until a threshold point triggers the mobile device 115 to switch back to the video view from the data view. Again, the mobile device 115 presents the video frame image 320 to the participant 125.

FIG. 4A is a graphic representation 400 of one embodiment of a graphic user interface illustrating a video view on a mobile device 115. The example user interface shows a video frame image 402 depicting a conference room. The video frame image 402 depicts a host 135 and a data screen 404 of the hosting device 101 projected on a wall of the conference room. The data screen 404 includes an embedded data screen 406. If the participant 125 performs a natural gesture on the data screen 404 captured in the video frame image 402, the mobile device 115 switches from the video view to the data view shown in FIG. 4B.

FIG. 4B is a graphic representation 420 of one embodiment of a graphic user interface illustrating a data view on a mobile device 115. In this example, a data stream including screenshot images of the data screen 404 is presented on the mobile device 115. The data stream depicts another multi-user communication session and includes an embedded data stream. For example, the data stream is a video clip of another conference with embedded slides. The embedded data screen 406 presenting the embedded slides is shown in the screenshot image of the data screen 404.

When the participant 125 switches to the data view shown in FIG. 4B from the video view shown in FIG. 4A, the data stream including the video clip starts to play. In one embodiment, the participant 125 may exit from the data view shown in FIG. 4B and return to the video view shown in FIG. 4A by performing a natural gesture (e.g., a pinch to close gesture) on the screenshot image of the data screen 404. In one embodiment, the participant 125 can keep zooming into the data view if the video clip includes embedded presentation slides or whiteboard strokes information. For example, if the participant 125 performs a natural gesture on the embedded data screen 406, the mobile device 115 can switch from the data view to an embedded data view shown in FIG. 4C to present slides embedded in the video clip.

FIG. 4C is a graphic representation 440 of one embodiment of a graphic user interface illustrating an embedded data view on a mobile device 115. In this example, the slides shown in the embedded data screen 406 are presented to the participant 125. The participant 125 may exit from the embedded data view and return to the data view shown in FIG. 4B by performing a natural gesture (e.g., a pinch to close gesture) on the screenshot image of the embedded data screen 406.

Methods

FIG. 5 is a flow diagram illustrating one embodiment of a method 500 for switching between video views and data views using natural gestures in a multi-user communication session. In one embodiment, the controller 202 receives 502 data indicating that a participant 125 joined a multi-user communication session from a mobile device 115 associated with the participant 125. The view presentation module 204 presents 504 a video stream of the multi-user communication session to the mobile device 115. For example, the view presentation module 204 instructs the user interface module 210 to generate graphical data for displaying the video stream. In one embodiment, the screen detection module 206 determines an occurrence of a detection trigger event. The controller 202 receives 506 a video frame image from the video stream responsive to the occurrence of the detection trigger event. For example, the controller 202 receives a latest video frame image of the video stream from the camera 103. The screen detection module 206 detects 508 a first data screen in the video frame image. For example, the screen detection module 206 determines that the video frame image captures the first data screen.
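
For orientation, the ordering of steps 502 through 514 might be wired together as in the sketch below. Every interface name is hypothetical, and the sketch omits error handling; it is not the disclosed control flow, only one plausible arrangement of it.

    def method_500(controller, presenter, detector, switcher, session, mobile):
        """Illustrative ordering of steps 502-514; all names are hypothetical."""
        controller.receive_join(session, mobile)               # 502
        presenter.present_video_stream(session, mobile)        # 504
        if detector.trigger_event_occurred():
            frame = controller.receive_video_frame(session)    # 506
            screen = detector.detect_data_screen(frame)        # 508
            gesture = controller.receive_gesture(mobile)       # 510
            if screen is not None and gesture is not None:
                switcher.switch_to_data_view(mobile)           # 512
                presenter.present_data_stream(screen, mobile)  # 514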

The controller 202 receives 510 data describing a first natural gesture performed on the mobile device 115. For example, the controller 202 receives data describing a pinch to open gesture performed on the first data screen in the video frame image. The view switching module 208 switches 512 a view on the mobile device 115 from video view to data view responsive to the first natural gesture. The view presentation module 204 presents 514 a first data stream associated with the first data screen on the mobile device 115. In one embodiment, the first data stream includes one or more high-definition screenshot images of the first data screen generated by a display device 107 associated with the first data screen.

FIGS. 6A-6C are flow diagrams illustrating another embodiment of a method 600 for switching between video views and data views using natural gestures in a multi-user communication session. Referring to FIG. 6A, the controller 202 receives 602 data indicating that a participant 125 joined a multi-user communication session from a mobile device 115 associated with the participant 125. The view presentation module 204 presents 604 a video stream of the multi-user communication session on the mobile device 115. The screen detection module 206 registers 606 a display device 107 with the registration server 130. The display device 107 includes a data screen for presenting a data stream of the multi-user communication session in the hosting environment 137. The controller 202 receives 608 images of the data screen from the display device 107 periodically. For example, the controller 202 receives screenshot images of the data screen from the display device 107 periodically.
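
Steps 606 and 608 amount to a register-then-publish loop, sketched below with assumed interfaces (register, upload, capture_screenshot) and an assumed update period; none of these names come from the disclosure.

    import threading
    import time

    def register_and_publish(display, registration_server, period_s=5.0):
        """Register the display device (step 606), then push screenshot
        images of its data screen periodically (step 608)."""
        registration_server.register(display.device_id)
        def loop():
            while True:
                registration_server.upload(display.device_id,
                                           display.capture_screenshot())
                time.sleep(period_s)
        threading.Thread(target=loop, daemon=True).start()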

The screen detection module 206 detects 610 an occurrence of a detection trigger event. The controller 202 receives 612 a latest video frame image from the camera 103 responsive to the occurrence of the detection trigger event. The screen detection module 206 performs 614 data screen detection in the latest video frame image using the latest image of the data screen received from the display device 107.

Referring to FIG. 6B, the screen detection module 206 determines 616 whether a sub-image that matches the latest image of the data screen is found in the latest video frame image. If the sub-image is found in the latest video frame image, the method 600 moves to step 618. Otherwise, the method 600 ends. Turning to step 618, the screen detection module 206 generates a matching result indicating the match between the latest image of the data screen and the latest video frame image, and notifies the mobile device 115 of the matching result. The screen detection module 206 also provides a data path between the mobile device 115 and the display device 107 associated with the data screen. For example, the screen detection module 206 establishes a direct connection between the devices. In one embodiment, the display device 107 can transmit a data stream associated with the data screen to the mobile device 115 via the direct connection.
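
Steps 616 and 618 might be glued together as in the following sketch. The notify and open_direct_connection calls are hypothetical placeholders for whatever signaling a deployment uses; the disclosure does not specify the transport.

    def on_match_checked(match, mobile, display):
        """Steps 616-618: stop if no sub-image matched; otherwise notify the
        mobile device and set up a direct data path to the display device."""
        if match is None:
            return  # no matching sub-image found; the method ends
        mobile.notify(match)  # mobile device highlights the matched data screen
        channel = display.open_direct_connection(mobile.address)
        channel.send(display.data_stream())  # data stream over the direct connection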

The controller 202 receives 622 data describing a first natural gesture performed by the participant 125 on the sub-image depicting the data screen in the video frame image. The controller 202 receives 624 a data stream associated with the data screen from the display device 107. The view switching module 208 switches 626 a view on the mobile device 115 from video view to data view responsive to the first natural gesture exceeding a threshold. For example, the user performs an expanding gesture starting at the center of the screen and moving over half the width of the screen. The view presentation module 204 presents 628 the data stream associated with the data screen on the mobile device 115.
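
The half-screen-width test given as an example for step 626 can be written directly, as below; the gesture field names are assumptions introduced for illustration.

    def first_gesture_exceeds_threshold(gesture, screen_width):
        """Step 626's example test: an expanding gesture that travels more
        than half the width of the screen triggers the view switch."""
        travel = abs(gesture["end_x"] - gesture["start_x"])
        return gesture["kind"] == "pinch_open" and travel > 0.5 * screen_width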

Referring to FIG. 6C, the controller 202 receives 630 data describing a second natural gesture performed by the participant 125 on the data stream. The view switching module 208 switches 632 the view on the mobile device 115 from data view back to video view responsive to the second natural gesture exceeding a threshold. The view presentation module 204 presents 634 the video stream on the mobile device 115.

FIG. 7 is a flow diagram illustrating one embodiment of a method 700 for switching between a video view and one of two different data views using a selection in a multi-user communication session. In one embodiment, the controller 202 receives 702 data indicating that a first participant, a second participant and a third participant 125 joined a multi-user communication session. The view presentation module 204 presents 704 a video stream of the multi-user communication session to a mobile device 115 associated with the third participant 125. For example, the view presentation module 204 instructs the user interface module 210 to generate graphical data for displaying the video stream. In one embodiment, the screen detection module 206 determines an occurrence of a detection trigger event. The controller 202 receives 706 a video frame image from the video stream that includes a first device associated with the first participant and a second device associated with the second participant. For example, the controller 202 receives a latest video frame image of the video stream from the camera 103. The screen detection module 206 detects 708 a first data screen from the first device and a second data screen from the second device in the video frame image.

The controller 202 receives 710 data describing a selection of the first data screen performed on the mobile device 115. For example, the controller 202 receives data describing a finger press in the center of the image of the first device, indicating that the third participant wants to view the first data view. The view switching module 208 switches 712 a view on the mobile device 115 from video view to a first data view that corresponds to the first data screen responsive to the selection. The view presentation module 204 presents 714 a first data stream associated with the first data screen on the mobile device 115.
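
The center-press selection of step 710 reduces to a tolerance test around each detected screen's center, as sketched below; the bounding-box fields and the 25% tolerance are assumed values, not part of the disclosure.

    def select_data_screen(press, screens, center_tolerance=0.25):
        """Step 710: a press near the center of a detected device image
        selects the corresponding data screen."""
        for screen in screens:
            x, y, w, h = screen["bbox"]
            cx, cy = x + w / 2.0, y + h / 2.0
            if (abs(press["x"] - cx) <= w * center_tolerance
                    and abs(press["y"] - cy) <= h * center_tolerance):
                return screen["stream_id"]
        return None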

The foregoing description of the embodiments has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the specification to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the embodiments be limited not by this detailed description, but rather by the claims of this application. As will be understood by those familiar with the art, the examples may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. Likewise, the particular naming and division of the modules, routines, features, attributes, methodologies and other aspects are not mandatory or significant, and the mechanisms that implement the description or its features may have different names, divisions and/or formats. Furthermore, as will be apparent to one of ordinary skill in the relevant art, the modules, routines, features, attributes, methodologies and other aspects of the specification can be implemented as software, hardware, firmware or any combination of the three. Also, wherever a component, an example of which is a module, of the specification is implemented as software, the component can be implemented as a standalone program, as part of a larger program, as a plurality of separate programs, as a statically or dynamically linked library, as a kernel loadable module, as a device driver, and/or in every and any other way known now or in the future to those of ordinary skill in the art of computer programming. Additionally, the specification is in no way limited to embodiments in any specific programming language, or for any specific operating system or environment. Accordingly, the disclosure is intended to be illustrative, but not limiting, of the scope of the specification, which is set forth in the following claims.

What is claimed is:
1. A computer-implemented method comprising:
    receiving, with one or more processors, data indicating a participant joined a multi-user communication session;
    presenting, with the one or more processors, a video stream of the multi-user communication session on a mobile device associated with the participant;
    receiving, with the one or more processors, a video frame image from the video stream responsive to an occurrence of a detection trigger event;
    detecting, with the one or more processors, a first data screen in the video frame image;
    receiving, with the one or more processors, data describing a first natural gesture performed on the mobile device;
    switching, with the one or more processors, a view on the mobile device from video view to data view responsive to the first natural gesture exceeding a threshold; and
    presenting, with the one or more processors, a first data stream associated with the first data screen on the mobile device.